Modularity clustering is force-directed layout 
Two natural and widely used representations for the community structure of networks are
clusterings, which partition the vertex set into disjoint subsets, and layouts, which assign the vertices
to positions in a metric space. This paper unifies prominent characterizations of layout quality 
and clustering quality, by showing that energy models of pairwise attraction and repulsion subsume 
Newman and Girvan's modularity measure. Layouts with optimal energy are relaxations of, and are 
thus consistent with, clusterings with optimal modularity, which is of practical relevance because 
both representations are complementary and often used together. 

I. INTRODUCTION

Many systems of scientific or practical interest are
decomposable into subsystems with strong internal and
relatively weak external interactions [1]; for example, there
are groups of friends or collaborators in social networks, 
sets of topically related documents in hypertexts, or blocs 
of interlocked countries in international trade. If systems 
are modeled as networks, with the system elements as 
vertices and their interactions as edges, then each sub- 
system corresponds to a so-called community, a set of 
vertices with dense internal connections but sparse con- 
nections to the remaining network. 

Two widely used representations of networks are lay- 
outs, which assign the vertices to positions in a metric 
space, and clusterings, which partition the vertex set into 
disjoint subsets. Both representations can group densely 
connected vertices, by placing them at nearby positions 
or in the same cluster, and separate sparsely connected 
vertices, by placing them at distant positions or in differ- 
ent clusters, and can thus naturally reflect the commu- 
nity structure. Requirements like the grouping of densely 
connected vertices are often formalized as mathematical 
functions called quality measures, and the optimization 
of quality measures is a common strategy for the
computation of both layouts […] and clusterings […].
Despite these commonalities, and although layouts and 
clusterings are often used together as complementary rep- 
resentations of the same network, there is no coherent 
understanding of layout quality and clustering quality. 

This paper unifies Newman and Girvan's modularity [8],
a popular quality measure for clusterings, with
energy models of pairwise attraction and repulsion
between vertices (e.g., […]), a widely used class of
quality measures for layouts. After an introduction of the
quality measures in Sec. II, Sec. III shows that layouts
with optimal energy and clusterings with optimal
modularity represent the community structure similarly, and
Sec. IV demonstrates that modularity actually is an
energy model of pairwise attraction and repulsion, if
clusterings are considered as restricted layouts. Section V
discusses the application of these results for computing 
consistent clusterings and layouts. 



II. ENERGY MODELS AND MODULARITY 

Quality measures for representations of networks
formalize what is considered a good representation, and
allow good representations to be computed automatically
using optimization algorithms. Mathematically, a quality
measure maps network representations to real numbers, 
such that larger (or smaller) numbers are assigned to 
better representations, and the best representations cor- 
respond to maxima (or minima) of the measure. This sec- 
tion introduces two widely used quality measures, namely 
energy models based on pairwise attraction and repulsion 
for layouts, and Newman and Girvan's modularity mea- 
sure for clusterings. 

To obtain uniform and general formulations, both
measures are defined for weighted networks. In a weighted
network, each vertex $v$ has a nonnegative real vertex
weight $w_v$, and each unordered vertex pair $\{u, v\}$
(including $u = v$) has a nonnegative real edge weight
$w_{\{u,v\}}$. Intuitively, a vertex (or edge) of weight $k$
can be thought of as a chunk of $k$ vertices (or edges) of
weight 1. The commonly studied unweighted networks
correspond to the special case where the edge weights are
either 0 (no edge) or 1, and the vertex weights are 1.



A. The (a, r)-energy model for layouts 



A $d$-dimensional layout $p$ of a network maps each
vertex $v$ to a position $p_v$ in $\mathbb{R}^d$; it thereby assigns a distance
to each vertex pair $\{u, v\}$, namely the Euclidean distance
$\|p_u - p_v\|$ between the respective vertex positions.
So-called energy models are an important class of quality
measures for layouts. In general, smaller energy indicates
better layouts. Because force is the negative gradient of
energy, energy models can also be represented as force
systems, and energy minima correspond to force
equilibria. For introductions to energy-based or force-directed
layout, see Refs. […].

The most popular energy models for general undirected 
networks are either similar to stress functions of
multidimensional scaling or represent force systems of
pairwise attraction and repulsion between vertices.
Models of the former type (e.g., […]) enforce that the
distance of each vertex pair in the layout approximates some
prespecified distance, most commonly the length of the 
shortest edge path between the vertices. They will not 
be further discussed, because their layouts reflect these 
path lengths rather than the community structure. 

In models of the latter type, adjacent vertices attract, 
which tends to group densely connected vertices, and all 
pairs of vertices repulse, which tends to separate sparsely 
connected vertices. The strengths of the forces are often 
chosen to be proportional to some power of the distance. 
Formally, for a layout $p$ and two vertices $u$, $v$ with $u \neq v$,
the attractive force exerted on $u$ by $v$ is

$$w_{\{u,v\}} \, \|p_u - p_v\|^{a} \, \overrightarrow{p_u p_v},$$

and the repulsive force exerted on $u$ by $v$ is

$$w_u w_v \, \|p_u - p_v\|^{r} \, \overrightarrow{p_v p_u},$$

where $\|p_u - p_v\|$ is the distance between $u$ and $v$, $\overrightarrow{p_u p_v}$ is
the unit-length vector pointing from $u$ to $v$, and $a$ and $r$
are real constants with $a > r$.

The condition $a > r$ ensures that the attractive force
between connected vertices grows faster than the
repulsive force, and thus prevents infinite distances except
between unconnected components. Most practical force
models satisfy $a \geq 0$ and $r \leq 0$, i.e., the attractive force is
non-decreasing and the repulsive force is non-increasing
with growing distance. In the widely used force model of
Fruchterman and Reingold [11], $a = 2$ and $r = -1$.

By exploiting that force is the negative gradient of en- 
ergy, the force model can be transformed into an energy 
model, such that force equilibria correspond to (local) 
energy minima. For a layout $p$ and constants $a, r \in \mathbb{R}$
with $a > r$, the (a, r)-energy is

$$\sum_{\{u,v\}:\, u \neq v} \left( w_{\{u,v\}} \frac{\|p_u - p_v\|^{a+1}}{a+1} - w_u w_v \frac{\|p_u - p_v\|^{r+1}}{r+1} \right), \qquad (1)$$

where $\frac{\|p_u - p_v\|^{-1+1}}{-1+1}$ must be read as $\ln \|p_u - p_v\|$ (because
$x^{-1}$ is the derivative of $\ln x$). The (1, -3)-energy model
has been proposed by Davidson and Harel [12], and the
(0, -1)-energy model is known as LinLog model […].
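
As a minimal sketch of Eq. (1), the following Python function evaluates the (a, r)-energy of a given layout. The function name `ar_energy`, the dictionary-based network representation, and the `frozenset` keys for unordered vertex pairs are assumptions made for illustration and do not come from the paper. The sketch further assumes that no two vertices share a position whenever an exponent equals -1 (the logarithm is undefined at distance 0).

```python
import math
from itertools import combinations

def ar_energy(pos, w_vertex, w_edge, a, r):
    """Evaluate the (a, r)-energy of Eq. (1) for a layout.

    pos      -- dict: vertex -> coordinate tuple
    w_vertex -- dict: vertex -> nonnegative vertex weight
    w_edge   -- dict: frozenset({u, v}) -> nonnegative edge weight
    """
    def power_term(d, e):
        # d^(e+1) / (e+1), read as ln(d) for e == -1
        return math.log(d) if e == -1 else d ** (e + 1) / (e + 1)

    energy = 0.0
    for u, v in combinations(sorted(pos), 2):  # each unordered pair once
        d = math.dist(pos[u], pos[v])
        w_uv = w_edge.get(frozenset((u, v)), 0.0)
        energy += w_uv * power_term(d, a)                      # attraction
        energy -= w_vertex[u] * w_vertex[v] * power_term(d, r)  # repulsion
    return energy
```

For two unit-weight vertices joined by a unit-weight edge at distance 2, the LinLog case (a = 0, r = -1) evaluates to 2 - ln 2.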



B. The modularity measure for clusterings 

A clustering $p$ of a network partitions the vertex set
into disjoint subsets called clusters, and thereby maps
each vertex $v$ to a cluster $p_v$. Proposals of quality
measures for clusterings are numerous and scattered over the
literature of diverse research fields; surveys, though
non-exhaustive, are provided by Refs. […].

One of the most widely used quality measures was
introduced by Newman and Girvan, and is called
modularity. It was originally defined for the special case where
the edge weights are either 0 or 1 and the weight of each
vertex is its degree [8], and was later extended to
networks with arbitrary edge weights […]. (The degree of
a vertex is the total weight of its incident edges, with
the edge weight from the vertex to itself counted twice.)
Generalized to arbitrary vertex weights, the modularity
of a clustering $p$ is

$$\sum_{c \in p(V)} \left( \frac{w_{\{c,c\}}}{w_{\{V,V\}}} - \frac{\frac{1}{2} w_c^2}{\frac{1}{2} w_V^2} \right), \qquad (2)$$

where $V$ is the set of all vertices in the network, and $p(V)$
is the set of clusters; the weight functions are naturally
extended to sets of vertices or edges: $w_{\{c,c\}}$ is the
total edge weight within the cluster $c$, and $w_c$ is the total
weight of the vertices in $c$.

Intuitively, the first term of the modularity measure is 
the actual fraction of intra-cluster edge weight. In itself, 
it is not a good measure of clustering quality, because it 
takes the maximum value for the trivial clustering where 
one cluster contains all vertices. This is corrected by 
subtracting a second term, which specifies the expected 
fraction of intra-cluster edge weight in a network with 
uniform density. Thus modularity takes positive values 
for clusterings where the total edge weight within clusters 
is larger than would be expected if the network had no 
community structure. 
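
The two terms of the modularity measure can be made concrete with a short Python sketch. The function name `modularity` and the dictionary-based representation (with `frozenset` keys for unordered vertex pairs, so that a self-edge is a one-element set) are illustrative assumptions, not part of the paper.

```python
def modularity(cluster_of, w_vertex, w_edge):
    """Modularity of a clustering, generalized to arbitrary vertex weights.

    cluster_of -- dict: vertex -> cluster label
    w_vertex   -- dict: vertex -> vertex weight
    w_edge     -- dict: frozenset of one or two vertices -> edge weight
    """
    w_V = sum(w_vertex.values())   # total vertex weight
    w_VV = sum(w_edge.values())    # total edge weight
    clusters = set(cluster_of.values())
    w_c = {c: 0.0 for c in clusters}    # vertex weight per cluster
    w_cc = {c: 0.0 for c in clusters}   # intra-cluster edge weight
    for v, w in w_vertex.items():
        w_c[cluster_of[v]] += w
    for pair, w in w_edge.items():
        ends = {cluster_of[v] for v in pair}
        if len(ends) == 1:              # both endpoints in the same cluster
            w_cc[ends.pop()] += w
    # actual fraction minus expected fraction of intra-cluster edge weight
    return sum(w_cc[c] / w_VV - (0.5 * w_c[c] ** 2) / (0.5 * w_V ** 2)
               for c in clusters)
```

For two triangles joined by a single edge, with unit edge weights and degree vertex weights, the clustering into the two triangles has modularity 5/14, while the trivial clustering with one cluster has modularity 0.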



C. Optimization algorithms 

Finding a minimum-energy layout or a
maximum-modularity clustering of a given network is
computationally hard; in particular, modularity maximization was
recently shown to be NP-complete [17]. In practice, energy
and modularity are almost exclusively optimized with
heuristic algorithms that are not guaranteed to find
optimal or near-optimal solutions.

An extensive experimental comparison of energy
minimization algorithms for network layout was performed by
Hachul and Jünger [13]; however, most of the examined
algorithms make fairly restrictive assumptions about the
optimized energy model. More general and reasonably
efficient is the force calculation algorithm by Barnes and
Hut [19], whose runtime is $O(m + n \log n)$ per iteration
for a network with $m$ edges (with nonzero weight) and
$n$ vertices (assuming that the number of dimensions is
small and the vertex distances are not extremely
nonuniform). The number of iterations required for convergence
typically grows sublinearly with $n$.
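
For small networks, the force model of Sec. II A can also be iterated directly over all vertex pairs. The following Python sketch performs one plain gradient-descent step with quadratic cost per iteration; it is not the Barnes-Hut algorithm, and the function name `force_step`, the fixed step size, and the data representation are assumptions chosen for illustration only.

```python
import math

def force_step(pos, w_vertex, w_edge, a, r, step=0.1):
    """One all-pairs gradient-descent step on the (a, r)-energy.

    pos      -- dict: vertex -> coordinate tuple
    w_vertex -- dict: vertex -> vertex weight
    w_edge   -- dict: frozenset({u, v}) -> edge weight
    Returns a new position dict; the input is left unchanged.
    """
    new_pos = {}
    for u, pu in pos.items():
        force = [0.0] * len(pu)
        for v, pv in pos.items():
            if u == v:
                continue
            d = math.dist(pu, pv)
            if d == 0.0:
                continue  # coincident vertices: direction undefined, skipped here
            w_uv = w_edge.get(frozenset((u, v)), 0.0)
            # net pull of u towards v: attraction minus repulsion
            magnitude = w_uv * d ** a - w_vertex[u] * w_vertex[v] * d ** r
            for i in range(len(pu)):
                force[i] += magnitude * (pv[i] - pu[i]) / d
        new_pos[u] = tuple(x + step * f for x, f in zip(pu, force))
    return new_pos
```

Repeating this step for two unit-weight vertices joined by a unit-weight edge, with a = 0 and r = -1, drives their distance towards 1, where attraction and repulsion balance.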

Clustering algorithms for networks are surveyed in
Refs. […]. A relatively fast yet very effective
heuristic for modularity maximization is agglomeration
by iteratively merging clusters (starting from singletons),
combined with single-level [21] or multi-level [22]
refinement by iteratively moving vertices; an efficient
implementation requires a runtime of $O(m \log^2 n)$ (assuming
$O(\log n)$ hierarchy levels in agglomeration and $O(\log n)$
iterations through all vertices per level in refinement).
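
The agglomeration phase can be sketched in a few lines of Python. This is a naive greedy variant without any refinement and without the data structures needed for the stated runtime; the function name `greedy_agglomeration` and the data representation are hypothetical. The merge gain used below follows from Eq. (2): joining clusters $c$ and $d$ changes the modularity by $w_{\{c,d\}}/w_{\{V,V\}} - w_c w_d / (\frac{1}{2} w_V^2)$.

```python
def greedy_agglomeration(w_vertex, w_edge):
    """Merge clusters greedily while some merge increases modularity.

    w_vertex -- dict: vertex -> vertex weight
    w_edge   -- dict: frozenset of one or two vertices -> edge weight
    Returns a list of vertex sets (the clusters).
    """
    w_V = sum(w_vertex.values())
    w_VV = sum(w_edge.values())
    clusters = {v: {v} for v in w_vertex}   # cluster id -> member vertices
    weight = dict(w_vertex)                  # cluster id -> vertex weight
    # edge weight between different clusters (self-edges never cross clusters)
    between = {pair: w for pair, w in w_edge.items() if len(pair) == 2}
    while True:
        best_gain, best_pair = 0.0, None
        for pair, w in between.items():
            c, d = tuple(pair)
            gain = w / w_VV - weight[c] * weight[d] / (0.5 * w_V ** 2)
            if gain > best_gain:
                best_gain, best_pair = gain, pair
        if best_pair is None:
            break                            # no merge improves modularity
        c, d = tuple(best_pair)
        clusters[c] |= clusters.pop(d)       # merge d into c
        weight[c] += weight.pop(d)
        merged = {}
        for pair, w in between.items():
            if pair == best_pair:
                continue                     # this weight is now intra-cluster
            renamed = frozenset(c if x == d else x for x in pair)
            merged[renamed] = merged.get(renamed, 0.0) + w
        between = merged
    return list(clusters.values())
```

Only adjacent cluster pairs are examined, because joining clusters without connecting edges cannot have a positive gain.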

III. ENERGY MODELS AND MODULARITY
REVEAL COMMUNITIES 



A set of vertices is called a community if the density
within the set is significantly larger than the density
between the set and the remaining network. The density
between two disjoint sets of vertices $T$ and $U$ is
intuitively the quotient of the actual edge weight and the
potential edge weight between $T$ and $U$; formally, it is
defined as $\frac{w_{\{T,U\}}}{w_T w_U}$, where $w_U$ is the total weight of the
vertices in $U$, and $w_{\{T,U\}}$ is the total edge weight
between $T$ and $U$. Similarly, the density within a vertex
set $U$ is $\frac{w_{\{U,U\}}}{\frac{1}{2} w_U^2}$. (This generalizes standard definitions
of density from graph theory [23] to weighted networks
with self-edges.)
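
The two density definitions can be sketched in Python as follows. The function names and the dictionary-based representation (with `frozenset` keys for unordered vertex pairs) are assumptions for illustration; the sketch also assumes that the vertex sets have positive total weight.

```python
def set_weight(U, w_vertex):
    """Total vertex weight of the vertex set U."""
    return sum(w_vertex[v] for v in U)

def density_between(T, U, w_vertex, w_edge):
    """Edge weight between disjoint sets T and U, divided by w_T * w_U."""
    T, U = set(T), set(U)
    w_TU = sum(w for pair, w in w_edge.items()
               if len(pair) == 2 and len(pair & T) == 1 and len(pair & U) == 1)
    return w_TU / (set_weight(T, w_vertex) * set_weight(U, w_vertex))

def density_within(U, w_vertex, w_edge):
    """Edge weight within U (self-edges included), divided by w_U^2 / 2."""
    U = set(U)
    w_UU = sum(w for pair, w in w_edge.items() if pair <= U)
    return w_UU / (0.5 * set_weight(U, w_vertex) ** 2)
```

For two triangles joined by one edge (unit edge weights, degree vertex weights), the density between the triangles is 1/49, which is smaller than the density 1/14 within the whole network.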

Existing theoretical results, which will be summarized 
and extended in this section, already show that the com- 
munity structure of a network is reflected in layouts with 
optimal (a, r)-energy (for certain values of a and r) and 
in clusterings with optimal modularity. What has previ- 
ously escaped notice is the striking analogy: The separa- 
tion of communities in an optimal layout is inversely pro- 
portional to (some power of) the density between them, 
and the separation of communities in an optimal cluster- 
ing reflects whether the density between them is smaller 
than a certain threshold. As an important limitation, 
the result for layouts will be derived only for two com- 
munities, and cannot be expected to hold precisely for 
more communities. Therefore, the consistency of
(a, r)-energy layouts and modularity clusterings will be
revisited in Sec. V, after further evidence has been presented
in Sec. IV.

In what appears to be the only previous work that
formally relates energy-based layout to modularity
clustering [14], we did not establish similarities between
optimal layouts and optimal clusterings, but only noted
that the modularity measure is mathematically similar
to the density (called normalized cut in [14]), as both
normalize the actual edge weight with a potential or
expected edge weight.



A. Representation of community structure in 
layouts with optimal (a, r)-energy 

This subsection discusses how the distances in a layout 
with optimal (a, r)-energy can be interpreted in terms of 
the community structure of the network, and how this 
interpretation depends on the parameters a and r. 

For the simple case of a network with two vertices, the
minimum-energy layouts can be computed analytically
(Theorem 3 in […]). If the vertices $u$ and $v$ have the
distance $d$, the (a, r)-energy is

$$U(d) = w_{\{u,v\}} \frac{d^{a+1}}{a+1} - w_u w_v \frac{d^{r+1}}{r+1}.$$

The derivative of this function is 0 at its minimum $d_0$,

$$0 = U'(d_0) = w_{\{u,v\}} d_0^{a} - w_u w_v d_0^{r},$$

thus

$$d_0 = \left( \frac{w_{\{u,v\}}}{w_u w_v} \right)^{-\frac{1}{a-r}}. \qquad (3)$$

Thus the distance of the two vertices in a layout with
optimal (a, r)-energy is the $\left(-\frac{1}{a-r}\right)$th power of the density
between the vertices. In particular, the distance is the
inverse density if $a - r = 1$, and the distance is almost
independent of the density if $a - r \gg 1$. This impact of
$a - r$ on the representation of the community structure
is illustrated for a larger network in Fig. 1.



FIG. 1: Layouts with small LinLog energy ($a - r = 1$) and
with small Fruchterman-Reingold energy ($a - r = 3$) of a
pseudo-random network with eight clusters (intra-cluster
density 1.0, expected inter-cluster density 0.2).
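
The closed form of Eq. (3) for two vertices can be checked numerically. The following Python sketch uses hypothetical function names and assumes positive weights and $a > r$; it computes the optimal distance and the derivative of the two-vertex energy, which should vanish at that distance.

```python
def optimal_distance(w_uv, w_u, w_v, a, r):
    """Optimal distance of two vertices: density^(-1/(a-r)), cf. Eq. (3)."""
    if a <= r:
        raise ValueError("the (a, r)-energy requires a > r")
    density = w_uv / (w_u * w_v)
    return density ** (-1.0 / (a - r))

def energy_derivative(d, w_uv, w_u, w_v, a, r):
    """Derivative of the two-vertex energy: attraction minus repulsion."""
    return w_uv * d ** a - w_u * w_v * d ** r
```

With edge weight 1 and vertex weights 2 (density 0.25), the LinLog parameters a = 0 and r = -1 give the inverse density 4 as optimal distance, whereas the Fruchterman-Reingold parameters a = 2 and r = -1 give only about 1.59, illustrating the weaker dependence on density for larger $a - r$.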

Replacing the edge $\{u, v\}$ with two edges $\{u, t\}$ and
$\{t, v\}$, where $t$ is a new vertex with weight 0, increases the
optimal distance between $u$ and $v$ by a factor of $2^{a/(a-r)}$.
Because the (a, r)-energy is only defined for $a - r > 0$, the
factor is 1 if $a = 0$, and greater than 1 if $a > 0$. This result
has a significant implication, given that the addition of $t$
increases the path length between $u$ and $v$ (from 1 to 2
edges) without changing the density: The optimal
distance of $u$ and $v$ depends only on the density, and not on
the path length, if $a = 0$ (as in the LinLog energy model),
and increases with the path length if $a > 0$.
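
The factor can be verified with a short calculation, given here as a sketch under stated assumptions: $a \geq 0$, both new edges carry the weight $w_{\{u,v\}}$ of the replaced edge, and $t$ lies at the midpoint of the segment between $u$ and $v$ (which is optimal for $a > 0$; for $a = 0$ any point on the segment yields the same energy). Because $t$ has weight 0, it takes part in no repulsion. The symbol $d_1$ for the new optimal distance is introduced here for illustration.

```latex
% Energy as a function of the distance d between u and v,
% with t at the midpoint (both edges have length d/2):
\begin{align*}
U(d) &= 2\, w_{\{u,v\}} \frac{(d/2)^{a+1}}{a+1} - w_u w_v \frac{d^{r+1}}{r+1}
      = 2^{-a}\, w_{\{u,v\}} \frac{d^{a+1}}{a+1} - w_u w_v \frac{d^{r+1}}{r+1}, \\
0 &= U'(d_1) = 2^{-a}\, w_{\{u,v\}}\, d_1^{a} - w_u w_v\, d_1^{r}, \\
d_1 &= \left( 2^{-a}\, \frac{w_{\{u,v\}}}{w_u w_v} \right)^{-\frac{1}{a-r}}
     = 2^{\frac{a}{a-r}}\, d_0 .
\end{align*}
```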

The results for networks with two or three vertices can
be generalized, at least as approximations, to larger
networks. In a network with clear communities, for
example, the density within the communities is (by definition)
much greater than the density between the communities,
and thus the intra-community distances in an optimal
layout are much smaller than the inter-community
distances (unless $a - r$ is very large). This can be
approximated by assuming that the vertices of each community
have the same position, and thus by considering each
community as one big vertex. For networks with more
than two communities, Eq. (3) cannot be expected to
hold precisely for all pairs of communities, because this
would often imply distances that violate the triangle
inequality. Nevertheless, the qualitative reasoning
generalizes: Distances are less dependent on densities for large
$a - r$, and less dependent on path lengths for small $a$.

FIG. 2: Layouts with optimal (a, r)-energy for different values
of a and r. All vertices and edges have weight 1, except for
the small vertex between the triangles which has weight 0.

Figure 2 illustrates the impact of the parameters $a$
and $r$ for two simple networks: For $a - r > 1$ (bottom
right), the two triangles are less clearly separated than
for $a - r = 1$ (bottom left and top right), and only for
$a = 0$ (left), the path length between the triangles does
not affect their distance.

Figure 3 summarizes the results of this subsection.

FIG. 3: Impact of the parameters a and r on the optimal
layouts of the (a, r)-energy model.



B. Representation of community structure in
clusterings with optimal modularity

Reichardt and Bornholdt [24] observed that in a
clustering with maximum modularity, the density between
any two clusters is at most the density within the entire
network, and the density between any two subclusters
obtained by splitting a cluster is at least the density within
the network. (Clusters may still have a smaller density
than the network, essentially because vertices without
self-edges decrease the density within their cluster but
cannot be split.) The argument is simple: Joining two
clusters $c$ and $d$ with $c \neq d$ increases the modularity by

$$\frac{w_{\{c,d\}}}{w_{\{V,V\}}} - \frac{w_c w_d}{\frac{1}{2} w_V^2},$$

which is positive if and only if

$$\frac{w_{\{c,d\}}}{w_c w_d} > \frac{w_{\{V,V\}}}{\frac{1}{2} w_V^2},$$

i.e., if the density between $c$ and $d$ is greater than the
density within the network. In a clustering with
maximum modularity, neither joining nor splitting clusters
may increase the modularity, which yields the claim.

These observations imply that the granularity of
clusterings with maximum modularity depends on the overall
density within the network, which may be undesirable for
some applications. For example, if the density within the
network is sufficiently small, then two dense subnetworks
connected by only one light-weight edge are joined into
a single cluster, instead of forming two separate
clusters [25]. Similarly, doubling a network (by adding a
second copy of the same network) halves its density, and
thus generally coarsens the optimal clustering instead of
preserving it [17]. Because such granularity-related issues
are specific to discrete representations like clusterings,
they provide a major motivation for the supplementary
(and sometimes even exclusive) use of continuous
representations like layouts.



IV. ENERGY SUBSUMES MODULARITY

Modularity can be considered as a special case of
(a, r)-energy. The first subsection formally derives this result,
and the second subsection explains how this derivation
is facilitated by the definitions of (a, r)-energy and
modularity in Sec. II, which generalize previous definitions
from the literature.



A. Transformation of modularity into (a, r)-energy

The modularity of a clustering $p$ was defined in
Sec. II B as

$$\sum_{c \in p(V)} \left( \frac{w_{\{c,c\}}}{w_{\{V,V\}}} - \frac{\frac{1}{2} w_c^2}{\frac{1}{2} w_V^2} \right),$$

i.e., as the difference of the actual fraction of intra-cluster
edge weight and the expected fraction of intra-cluster
edge weight.

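Because each edge is either intra-cluster or
inter-cluster, the fraction of intra-cluster edge weight and the
fraction of inter-cluster edge weight add up to 1:

$$\sum_{c \in p(V)} \frac{w_{\{c,c\}}}{w_{\{V,V\}}} + \sum_{\{c,d\} \subseteq p(V):\, c \neq d} \frac{w_{\{c,d\}}}{w_{\{V,V\}}} = 1;$$

similarly, the corresponding expected fractions add up
to 1. Thus the modularity of $p$ can be written in terms
of inter-cluster edge weights as

$$-\sum_{\{c,d\} \subseteq p(V):\, c \neq d} \left( \frac{w_{\{c,d\}}}{w_{\{V,V\}}} - \frac{w_c w_d}{\frac{1}{2} w_V^2} \right)
= -\sum_{\{u,v\} \subseteq V:\, p_u \neq p_v} \left( \frac{w_{\{u,v\}}}{w_{\{V,V\}}} - \frac{w_u w_v}{\frac{1}{2} w_V^2} \right).$$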

Let $k$ be the number of clusters in $p$. Without changing
the modularity of $p$, the $k$ clusters can be considered as
positions in $\mathbb{R}^{k-1}$, such that each pair of different clusters
has the distance 1. (Intuitively, the $k$ clusters form the
corners of a regular $(k-1)$-simplex with edge length 1; a
$(k-1)$-simplex is the $(k-1)$-dimensional analogue of a
triangle.) Then the clustering $p$ is a $(k-1)$-dimensional
layout, and the modularity of $p$ can be rewritten as

$$-\sum_{\{u,v\} \subseteq V:\, p_u \neq p_v} \left( \frac{w_{\{u,v\}}}{w_{\{V,V\}}} \|p_u - p_v\| - \frac{w_u w_v}{\frac{1}{2} w_V^2} \|p_u - p_v\| \right).$$



The condition $p_u \neq p_v$ of the sum can be dropped or
replaced with $u \neq v$, because it excludes only vertex pairs
$\{u, v\}$ with $\|p_u - p_v\| = 0$.

Because the distances between the vertices are 0 or 1,
the modularity of $p$ equals

$$-\sum_{\{u,v\}:\, u \neq v} \left( \frac{w_{\{u,v\}}}{w_{\{V,V\}}} \|p_u - p_v\|^{a+1} - \frac{w_u w_v}{\frac{1}{2} w_V^2} \|p_u - p_v\|^{r+1} \right)$$

for all $a, r \in \mathbb{R}$ with $a > -1$ and $r > -1$. This is the
negative (a, r)-energy, except for the constant factors in
the attraction term and the repulsion term, which change
only the scaling of the optimal layouts.
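
The equivalence can be checked numerically on a small example. The following Python sketch is illustrative only: all function names are hypothetical, and for simplicity the $k$ clusters are placed at the scaled unit vectors $e_i/\sqrt{2}$ in $\mathbb{R}^k$ instead of $\mathbb{R}^{k-1}$, which yields the same pairwise distances (0 within a cluster, 1 between clusters).

```python
import math
from itertools import combinations

def modularity(cluster_of, w_vertex, w_edge):
    """Modularity with arbitrary vertex weights (frozenset edge keys)."""
    w_V, w_VV = sum(w_vertex.values()), sum(w_edge.values())
    clusters = set(cluster_of.values())
    w_c = {c: 0.0 for c in clusters}
    w_cc = {c: 0.0 for c in clusters}
    for v, w in w_vertex.items():
        w_c[cluster_of[v]] += w
    for pair, w in w_edge.items():
        ends = {cluster_of[v] for v in pair}
        if len(ends) == 1:
            w_cc[ends.pop()] += w
    return sum(w_cc[c] / w_VV - (0.5 * w_c[c] ** 2) / (0.5 * w_V ** 2)
               for c in clusters)

def simplex_layout(cluster_of):
    """Map each vertex to its cluster's simplex corner e_i / sqrt(2)."""
    ids = sorted(set(cluster_of.values()))
    k = len(ids)
    corner = {c: tuple(1 / math.sqrt(2) if j == i else 0.0 for j in range(k))
              for i, c in enumerate(ids)}
    return {v: corner[c] for v, c in cluster_of.items()}

def scaled_negative_energy(pos, w_vertex, w_edge, a, r):
    """Negative (a, r)-energy with the constant factors of the derivation."""
    w_V, w_VV = sum(w_vertex.values()), sum(w_edge.values())
    total = 0.0
    for u, v in combinations(sorted(pos), 2):
        d = math.dist(pos[u], pos[v])  # 0 within clusters, 1 between
        w_uv = w_edge.get(frozenset((u, v)), 0.0)
        total += (w_uv / w_VV) * d ** (a + 1)
        total -= (w_vertex[u] * w_vertex[v] / (0.5 * w_V ** 2)) * d ** (r + 1)
    return -total
```

For any clustering and any exponents greater than -1, `scaled_negative_energy(simplex_layout(cluster_of), ...)` should agree with `modularity(cluster_of, ...)` up to floating-point rounding.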



B. Prerequisites of the transformation 

The transformation of modularity into (a, r)-energy in
the previous subsection is based on the definitions of the
measures in Sec. II, which generalize previous definitions
from the literature in several respects.

First, the goal of most energy-based layout techniques
is to produce easily readable box-and-line visualizations,
which differs from and even conflicts with producing
faithful representations of the community structure. The
classic energy models of Eades […], Fruchterman and
Reingold [11], and Davidson and Harel [12] primarily
reward the conformance to aesthetic criteria like small edge
lengths and uniformly distributed vertices, and thus
often prevent the clear separation of sparsely connected
vertices and the clear grouping of densely connected
vertices (see Fig. 1). The design and evaluation of energy
models with the explicit purpose of representing the
community structure started only recently with the LinLog
model […, 13]. Technically, the classic energy models
are, or are similar to, instances of the (a, r)-energy model
where the difference $a - r$ is fixed and too large; the
(a, r)-energy model is parameterized with this difference.



Second, most existing energy models are designed to 
strongly discourage the placement of several vertices on 
the same position, while clusterings may place many ver- 
tices in the same cluster. Technically, existing energy 
models are not mathematically equivalent to modular- 
ity because the exponent of the distance in the repulsion 
energy is fixed and too small; the (a, r)-energy model is 
parameterized with this exponent. 

Third, the modularity measure and most energy
models were originally defined for networks without vertex
weights. The vertices are implicitly weighted with 1 in
most classic energy models (e.g., […]), and with
their degree in the original modularity measure [8]. It was
only recently observed that degree-weighting may also
improve the readability and interpretability of
energy-based layouts […]. The definitions of (a, r)-energy
and modularity in Sec. II are generalized to arbitrary
vertex weights, and thus subsume both degree weights
and unit weights.



C. Related work 

In the analysis of dissimilarity matrices, the
computation of clusterings and layouts with identical quality
measures is fairly common (e.g., [28], […]). The trick is
to represent both clusterings and layouts of dissimilarity 
matrices as dissimilarity matrices: The dissimilarity of 
two objects in a layout can be defined as their Euclidean 
distance (as for networks), and the dissimilarity of two 
objects in a clustering can be defined as the average dis- 
similarity of the objects in their clusters (unlike for net- 
works, which specify no dissimilarities for their vertices). 
With this common representation of clusterings and lay- 
outs, it is easy to design common quality measures. 

For networks, there appear to be no previous
proposals of using identical quality measures for both
clusterings and layouts. Some clustering algorithms compute
layouts as intermediate results, for example
eigenvector-based heuristics for modularity clustering [30, 31] and
approximation algorithms for some related partitioning
problems […], but these layouts are not intended
to be useful on their own.



V. OPTIMAL-ENERGY LAYOUTS CONFORM 
TO OPTIMAL-MODULARITY CLUSTERINGS 

Clusterings and layouts complement each other as rep- 
resentations for the community structure of networks. 
Layouts are limited to two or three dimensions in prac- 
tice, and thus cannot faithfully represent inherently high- 
dimensional structures, but they may show crucial details 
that are missing in clusterings: 

• the density between clusters, and more generally, 
the relationship between clusters, e.g., whether 
their separation is clear or fuzzy, and which ver- 
tices form their interface, 

• the density within clusters, and more generally, the
internal structure of clusters, e.g., whether a dense 
cluster is composed of even denser subclusters, 

• the density between vertices and clusters, e.g., 
whether a vertex is central or peripheral to its clus- 
ter, or whether the assignment of a vertex to a clus- 
ter is rather arbitrary because it is closely related 
to several other clusters. 

However, a layout only permits these interpretations if 
it is consistent with the respective clustering, i.e., if the 
layout and the clustering group the vertices according 
to the same criteria. In previous works, some authors 
nonetheless consider vertex groups in arbitrary force- 
directed layouts as clusters, while others rightly note that 
they have no reasons to suppose that such interpretations 
are valid. Sections III and IV finally provide such
reasons, as summarized in the following subsection.



A. Evidence 

Section IV showed that for clusterings with $k$ clusters,
considered as restricted $(k-1)$-dimensional layouts, the
(a, r)-energy model is equivalent to the modularity
measure if $a > -1$ and $r > -1$. Thus (unrestricted) layouts
with optimal (a, r)-energy are relaxations of clusterings
with optimal modularity if (a) the layouts have at least
$k - 1$ dimensions, and (b) $a > -1$ and $r > -1$.

Concerning condition (a), the dimensionality of lay- 
outs can be somewhat reduced without large changes 
of the pairwise vertex distances, and thus without large 
changes of the (a, r)-energy. Hence the consistency of 
optimal layouts and optimal clusterings does not break 
down immediately if the layout has fewer dimensions than
the clustering has clusters.

Condition (b) does not imply that layouts with
optimal (a, r)-energy closely resemble clusterings with
optimal modularity precisely for $a > -1$ and $r > -1$. On the
one hand, the condition $r > -1$ is necessary for
clusterings to permit the assignment of several vertices to the
same cluster, but not for layouts, which may group
vertices without placing them on exactly the same position.
On the other hand, the precise values of $a$ and $r$ hardly
matter for clusterings, where the distance between
vertices is either 0 or 1, but were shown to be important for
layouts in Sec. III. Considering the results of Sec. III,
(a, r)-energy layouts most closely resemble modularity
clusterings if

• $a > r$, $a \geq 0$, and $r \leq 0$ (by the definition of
(a, r)-energy),

• $a \approx 0$, such that distances do not reflect path
lengths, and

• $a - r \approx 1$, or at least $a - r \not\gg 1$, such that distances
reflect densities.



B. Examples 

The purpose of this subsection is to illustrate the con- 
sistency of (a, r)-energy layouts and modularity cluster- 
ings, and the benefits of this consistency, for several real- 
world networks. It should be stressed that the purpose is 
not to validate the (a, r)-energy model or the modularity 
measure, which are already widely used and discussed in 
many previous works; and the purpose is not to prove 
the consistency of (a, r)-energy layouts and modularity 
clusterings, because the mathematical evidence summa- 
rized in the previous subsection is more general than any 
number of examples. 

The example networks are listed in Table I. The weight
of each vertex is set to its degree, as in the original
modularity measure [8] and in the edge-repulsion LinLog
energy model [14]. In visualizations, each vertex is
represented as a box, its degree (weight) as area of the box,
and its cluster membership as shape of the box.

TABLE I: Example networks

Name                 Size  Source
Karate Club          34    [35, Figure 3]; unweighted version used in [8, 17, 36]
Book Co-Purchase     105   V. Krebs, provided by M. Newman (a); also used in [17, 36]
Food Classification  45    [37], published in [38, Table 5.1]
World Trade          66    World Bank (b)

(a) http://www-personal.umich.edu/~mejn/netdata/
(b) Trade and Production Database at http://www.worldbank.org

As motivated in the previous subsection, the
parameters of the energy model are set to $a = 0$ and
$r \in \{-2, -1.5, -1\}$, with $r = -2$ for networks with very
nonuniform density (modularity > 0.5), and $r = -1$ for
networks with fairly uniform density (modularity < 0.3).
The variation of $r$ improves the readability by ensuring
that vertices are not placed too closely, but otherwise
does not affect the grouping of the vertices.

Because the exact optimization of (a, r)-energy and
modularity is computationally hard, the presented
layouts and clusterings are not guaranteed to be optimal
(except for the clustering of the Book Co-Purchase
network [13]), but are the best known representations. The
Java program used for generating these representations
is freely available [12]. It employs the Barnes-Hut
algorithm for energy minimization, and agglomeration with
multi-level refinement for modularity maximization (see
Sec. II C).

In the Karate Club network (Fig. 4), each vertex
represents a member of a karate club, and the edge weight
of each vertex pair specifies the number of contexts (like
university classes, bars, or karate tournaments) in which
the two members interacted. The main vertex groups in
the (0, -1.5)-energy layout coincide with the four
clusters of the modularity clustering, and the layout correctly
indicates that joining triangles and circles into a single
cluster is almost as good as separating them (modularity
0.435 vs. 0.445). The clustering and the layout both
segregate the members who left the club after the instructor
was fired (gray boxes), with the exception of one member
who followed the instructor mainly to preserve his chance
for the black belt.




FIG. 4: (0, -1.5)-energy layout and modularity clustering
(represented by shapes) of the Karate Club network. The 
modularity of the clustering is 0.445. Gray boxes represent 
members who left the club after the instructor was fired. 

In the Book Co-Purchase network (Fig. 5), the vertices
represent books on US politics, and edges of weight 1
connect books that were frequently purchased together. The
clusters are generally well-separated in the layout; a few
members of the smaller central clusters are placed close
to one of the two large clusters, which correctly indicates
that they are densely connected with parts of these large
clusters, and their assignment to a smaller cluster is a
close decision. The clustering and the layout, especially
their two main groups, conform well to Newman's
classification [36] of the books as liberal (light gray), neutral
(dark gray), or conservative (black); the layout is more
suitable to represent the liberal-to-conservative ordering
of the books.




FIG. 5: (0, -2)-energy layout and modularity clustering of
the Book Co-Purchase network. The modularity is 0.527. 
Shades represent the classification as liberal (light gray), neu- 
tral (dark gray), or conservative (black). 

The Food Classification network (Fig. 6) represents the
categorizations of 45 foods by 38 subjects of a
psychological experiment, who were asked to sort the foods into as
many categories as they wished based on perceived
similarity. Each vertex represents a food, and the edge weight
of each vertex pair is the number of subjects who assigned
the corresponding foods to the same category. The
clusters correspond well to groups in the layout, but the
layout also indicates that the borders between some clusters
are rather fuzzy (e.g., between snacks and sweets), that
some clusters could be split into subclusters (e.g., fruits
and vegetables), and that some foods cannot be clearly
assigned to a single cluster (e.g., water, spaghetti). The
grouping in both the clustering and the layout largely
conforms to common food categories.

FIG. 6: (0, -1.5)-energy layout and modularity clustering of
the Food Classification network. The modularity of the clus- 
tering is 0.402. (The edges are elided to avoid clutter.) 



The World Trade network (Fig. 7) models the trade
between 66 countries in the year 1999. The vertices
represent countries, and the edge weight of each vertex
pair specifies the trade volume between the
corresponding countries in US dollars. The clustering and the layout
both group the countries of the three major economic
areas (East Asia / Australia, America, and Europe). The
layout also reflects that countries like IRN and EGY
cannot be clearly assigned to either the East Asian or the
European group, and shows many smaller groups of closely
interlocked countries like CHN and HKG, AUS and NZL,
GBR and IRL, and the Nordic countries.

FIG. 7: (0, -1)-energy layout and modularity clustering of
the World Trade network. The modularity of the clustering 
is 0.275. (The edges are elided to avoid clutter.) 

VI. CONCLUSION

As representations for the community structure of net- 
works, layouts subsume clusterings, thus quality mea- 
sures for layouts subsume quality measures for cluster- 
ings, and in fact prominent existing quality measures for 
layouts - namely, energy models based on the pairwise 
attraction and repulsion of vertices - subsume a promi- 
nent existing quality measure for clusterings - namely, 
the modularity measure of Newman and Girvan. This 
result has implications for the entire lifecycle of quality
measures: 

• Design: New and existing quality measures for
layouts may be applied to clusterings and vice versa.
For example, recent extensions of the modularity
measure to directed networks [39] and bipartite
networks […] can be directly generalized to energy
models for layouts.

• Evaluation: The evaluation of quality measures for
clusterings and layouts can be partly unified, i.e.,
performed without distinguishing between
clusterings and layouts. This has been demonstrated
in [15] with a computation of the expected
measurement value for networks with uniform expected
density, a particularly important analysis technique
[…].

• Optimization: Components of clustering
algorithms may be reused in layout algorithms and vice
versa, for example the agglomeration (coarsening)
phase of multi-level heuristics. Moreover,
energy-based layout algorithms might serve as initial stage
of clustering algorithms, similarly to
eigenvector-based layout algorithms in existing approaches (see
Sec. IV C).

• Application: Unified quality measures help to
ensure the consistency of clusterings and layouts (see
Sec. V), which is crucial because both
representations are often used together.